36 research outputs found

    Emerging multidisciplinary research across database management systems

    The database community is exploring more and more multidisciplinary avenues: data semantics overlaps with ontology management; reasoning tasks venture into the domain of artificial intelligence; and data stream management and information retrieval shake hands, e.g., when processing Web click-streams. These new research avenues become evident, for example, in the topics that doctoral students choose for their dissertations. This paper surveys the emerging multidisciplinary research by doctoral students in database systems and related areas. It is based on PIKM 2010, the 3rd Ph.D. workshop at the International Conference on Information and Knowledge Management (CIKM). The topics addressed include ontology development, data streams, natural language processing, medical databases, green energy, cloud computing, and exploratory search. In addition to core ideas from the workshop, we list some open research questions in these multidisciplinary areas.

    Quality-Driven Disorder Handling for M-way Sliding Window Stream Joins

    Sliding window join is one of the most important operators for stream applications. To produce high-quality join results, a stream processing system must deal with the ubiquitous disorder within input streams, which is caused by network delay, asynchronous source clocks, etc. Disorder handling involves an inevitable tradeoff between the latency and the quality of produced join results. To meet the differing requirements of stream applications, it is desirable to provide a user-configurable result-latency vs. result-quality tradeoff. Existing disorder handling approaches either do not provide such configurability, or support only user-specified latency constraints. In this work, we advocate the idea of quality-driven disorder handling, and propose a buffer-based disorder handling approach for sliding window joins which minimizes the sizes of input-sorting buffers, and thus the result latency, while respecting user-specified result-quality requirements. The core of our approach is an analytical model which directly captures the relationship between the sizes of input buffers and the produced result quality. Our approach is generic: it supports m-way sliding window joins with arbitrary join conditions. Experiments on real-world and synthetic datasets show that, compared to the state of the art, our approach can reduce the result latency incurred by disorder handling by up to 95% while providing the same level of result quality.
    Comment: 12 pages, 11 figures, IEEE ICDE 201
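    The input-sorting buffer described above can be illustrated with a simple K-slack-style policy (a minimal sketch, not the paper's quality-driven model; class and parameter names are hypothetical): tuples are held until a watermark, derived from the largest timestamp seen minus a slack k, passes them, so a smaller k means lower result latency but more residual disorder.

```python
import heapq

class SortingBuffer:
    """K-slack-style input-sorting buffer: tuples are held until the
    watermark (max timestamp seen minus k) passes them. A smaller k
    lowers latency but risks emitting streams with residual disorder."""
    def __init__(self, k):
        self.k = k                      # slack, i.e., the latency budget in time units
        self.heap = []                  # min-heap ordered by tuple timestamp
        self.max_ts = float("-inf")

    def insert(self, ts, value):
        self.max_ts = max(self.max_ts, ts)
        heapq.heappush(self.heap, (ts, value))
        out = []
        # release everything at or below the watermark, in timestamp order
        while self.heap and self.heap[0][0] <= self.max_ts - self.k:
            out.append(heapq.heappop(self.heap))
        return out

buf = SortingBuffer(k=2)
emitted = []
for ts, v in [(1, "a"), (3, "b"), (2, "c"), (6, "d"), (5, "e")]:
    emitted += buf.insert(ts, v)
# tuples come out sorted by timestamp despite arriving out of order
```

    In a join operator, one such buffer would sit in front of each of the m input streams, and the quality-driven approach of the paper would choose each k from its analytical model rather than fixing it by hand.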

    FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory

    The advent of Storage Class Memory (SCM) is driving a rethink of storage systems towards a single-level architecture where memory and storage are merged. In this context, several works have investigated how to design persistent trees in SCM as a fundamental building block for these novel systems. However, these trees are significantly slower than their DRAM-based counterparts, since trees are latency-sensitive and SCM exhibits higher latencies than DRAM. In this paper, we propose a novel hybrid SCM-DRAM persistent and concurrent B-Tree, named the Fingerprinting Persistent Tree (FPTree), that achieves similar performance to DRAM-based counterparts. In this novel design, leaf nodes are persisted in SCM while inner nodes are placed in DRAM and rebuilt upon recovery. The FPTree uses Fingerprinting, a technique that limits the expected number of in-leaf probed keys to one. In addition, we propose a hybrid concurrency scheme for the FPTree that is partially based on Hardware Transactional Memory. We conduct a thorough performance evaluation and show that the FPTree outperforms state-of-the-art persistent trees under different SCM latencies by up to a factor of 8.2. Moreover, we show that the FPTree scales very well on a machine with 88 logical cores. Finally, we integrate the evaluated trees in memcached and a prototype database. We show that the FPTree incurs an almost negligible performance overhead over using fully transient data structures, while significantly outperforming other persistent trees.
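    The fingerprinting idea can be sketched in a few lines (an illustration under assumed details, not the FPTree implementation; the hash choice and leaf layout are hypothetical): each leaf stores a contiguous array of one-byte key hashes, and a lookup only performs a full key comparison on slots whose fingerprint matches, which keeps the expected number of probed keys near one.

```python
import hashlib

def fingerprint(key):
    """One-byte hash of the key, stored contiguously in the leaf."""
    return hashlib.blake2b(key.encode(), digest_size=1).digest()[0]

class Leaf:
    """Unsorted leaf with a fingerprint array scanned before any full
    key comparison (a sketch of the Fingerprinting technique)."""
    def __init__(self):
        self.fps = []    # contiguous fingerprint array, cheap to scan
        self.keys = []
        self.vals = []

    def insert(self, key, val):
        self.fps.append(fingerprint(key))
        self.keys.append(key)
        self.vals.append(val)

    def lookup(self, key):
        fp = fingerprint(key)
        # only slots whose one-byte fingerprint matches are probed, so
        # most non-matching keys are skipped without comparing them
        for i, f in enumerate(self.fps):
            if f == fp and self.keys[i] == key:
                return self.vals[i]
        return None
```

    In the actual design the leaf lives in SCM and the fingerprint scan avoids expensive SCM reads of the keys themselves; the sketch only shows the probe-filtering logic.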

    SPA: Economical and workload-driven indexing for data analytics in the cloud

    Selective queries are not uncommon in large-scale data analytics, for example, when drilling down into a specific customer in a dashboard. Traditionally, selective queries are accelerated by creating secondary indexes. However, because of their large size, expensive maintenance, and difficulty to tune and automate, indexes are typically not used in modern cloud data warehouses or data lakes. Instead, such systems rely mostly on full table scans and lightweight optimizations like min/max filtering, whose effectiveness depends heavily on the data layout and value distributions. We propose SPA as the vision for automatically optimizing selective queries for immutable copy-on-write data formats. SPA adaptively indexes subsets of the data in an incremental and workload-driven manner. It makes fine-grained decisions and continuously monitors their benefit, dynamically allocating an optimization budget in a way that bounds the additional cost of indexing. Furthermore, it guarantees a performance improvement in the cases where indexes - potentially partial ones - prove to be beneficial. When indexes lose their benefit due to a shifting workload, they are gradually deconstructed in favor of optimizations that accommodate recent trends. As SPA does not require information about updates performed on the data, it can also be employed as an accelerator for systems that do not control the data, e.g., in cloud data lakes.
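    The workload-driven, budget-bounded idea can be sketched as follows (a deliberately simplified illustration, not SPA's actual decision logic; all names and thresholds are hypothetical): key ranges that are queried often enough earn a partial index, paid for out of a fixed budget that bounds the indexing cost, while cold ranges fall back to scans with min/max filtering.

```python
from collections import Counter

class PartialIndexer:
    """Sketch of workload-driven partial indexing: hot key ranges earn a
    partial index, paid for out of a fixed budget, so the extra cost of
    indexing stays bounded no matter how the workload shifts."""
    def __init__(self, budget, hot_threshold=3):
        self.budget = budget                # max number of indexed ranges
        self.hot_threshold = hot_threshold  # queries before a range is "hot"
        self.hits = Counter()               # queries observed per range
        self.indexed = set()                # ranges with a partial index

    def query(self, key_range):
        self.hits[key_range] += 1
        if key_range in self.indexed:
            return "index scan"
        if (self.hits[key_range] >= self.hot_threshold
                and len(self.indexed) < self.budget):
            # incrementally build a partial index for this hot range
            self.indexed.add(key_range)
            return "index scan"
        return "full scan with min/max filtering"
```

    A fuller sketch would also decay the hit counters and evict indexes for ranges that go cold, mirroring the gradual deconstruction the abstract describes.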

    View evolution support for information integration systems over dynamic distributed information spaces.

    Challenging issues for creating and maintaining tailored information gathering systems over large-scale information spaces (e.g., Digital Libraries, the World Wide Web) include the diversity of the information sources (ISs) in terms of their structures, query interfaces and search engines, as well as the dynamics of sources continuously being added, removed or upgraded. Current information integration systems are often based on static, a priori defined views that gather information from heterogeneous information sources and provide the user with a uniform view of the information space for browsing and querying. This dissertation addresses one of the largely unexplored issues that such information integration systems raise, namely, the evolution and maintenance of data warehouses when the underlying information sources change their capabilities, i.e., schema-level changes. The overall solution approach that this dissertation puts forth consists of defining the problem of view evolution triggered by capability changes of ISs and designing evolution algorithms that achieve synchronization of the affected views in the presence of these types of changes.
    The contributions made by this dissertation include (1) an extension of the SQL view definition language that allows the user to specify evolution preferences a priori, e.g., whether dropping or changing a view component is acceptable; (2) a formal definition of what constitutes a legal view rewriting under capability changes, i.e., the semantics of view evolution; (3) algorithms for view synchronization that find a modified view definition in response to a capability change of an IS; (4) algorithms for the maintenance of materialized views after the view synchronization process; (5) experimental evaluations comparing the maintenance strategies after view synchronization with alternative maintenance techniques; and (6) the development of a working system incorporating some of the proposed evolution algorithms.
    Ph.D. dissertation, Applied Sciences (Computer Science), University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/131740/2/9929909.pd
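    The evolution preferences in contribution (1) can be illustrated with a small sketch (all names and the flag vocabulary are hypothetical and do not reflect the actual extended-SQL syntax): each view attribute records whether it may be dropped, and synchronization either removes dispensable attributes lost to a capability change or reports that no legal rewriting exists.

```python
# Hypothetical sketch of a view definition carrying evolution preferences:
# each attribute says whether it may be dropped when its information
# source removes the underlying column.
view = {
    "name": "customer_orders",
    "attributes": [
        {"col": "cust_id",  "dispensable": False},
        {"col": "order_id", "dispensable": False},
        {"col": "discount", "dispensable": True},   # may be dropped
    ],
}

def synchronize(view, dropped_cols):
    """Return a rewritten view after a capability change, or None if the
    view becomes undefined (an indispensable attribute was lost)."""
    kept = []
    for attr in view["attributes"]:
        if attr["col"] in dropped_cols:
            if not attr["dispensable"]:
                return None          # no legal rewriting exists
            continue                 # silently drop the dispensable attribute
        kept.append(attr)
    return {**view, "attributes": kept}

# the source drops "discount": the view survives without it
evolved = synchronize(view, {"discount"})
```

    The dissertation's algorithms go further, finding substitute sources for affected components rather than only dropping them, but the flag-driven accept-or-fail decision above is the core of the preference mechanism.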

    Using Complex Substitution Strategies for View Synchronization

    Large-scale information systems typically contain autonomous information sources (ISs) that dynamically modify their content, their interfaces, and their query services, regardless of the data warehouses (views) that are built on top of them. Current view technology fails to provide adaptation techniques for such changes, supporting only static views in the sense that views become undefined when ISs undergo capability changes. We propose to address this new view evolution problem - which we call view synchronization - by allowing view definitions to be dynamically evolved when they become undefined. The foundations of our approach to view synchronization include: the EvolvableSQL view definition language (E-SQL), the model for information source description (MISD), and the concept of legal view rewritings. In this paper, we now introduce the concept of the strongest synch-equivalent view definition that explicitly defines the evolution semantics associated with an E-SQL…

    PIKM 2010: ACM Workshop for Ph.D. Students in Information and Knowledge Management

    The PIKM workshop focuses on papers consisting mainly of the Ph.D. dissertation proposals of doctoral students. A wide range of topics in databases, information retrieval and knowledge management is presented at this workshop. The areas of interest are similar to those of the three respective tracks at the CIKM main conference. Interdisciplinary work across these tracks is encouraged.

    Real-Time Networking over HIPPI

    HIPPI provides a very-high-speed communication medium, which is very well suited to a large number of bandwidth-demanding distributed applications. Unfortunately, its circuit-switched nature makes it very difficult to provide real-time guarantees when connections contend for network resources. We present a time-division-multiplex access scheme designed to give timing guarantees to high-speed connections. We describe the problem of scheduling access to a HIPPI network, and show that, although the problem is very unlikely to be computationally tractable, very simple heuristics give high network utilization for moderately sized networks. We present the RMP/RMCP protocol, our implementation of the scheme described in this paper on the XUNET-West HIPPI testbed.
    1 Introduction. A large number of applications in distributed control, distributed virtual reality, and remote laboratory work demand hard delay guarantees in order to satisfy the timing requirements of their time-critical com…
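    A greedy first-fit slot-assignment heuristic of the simple kind the abstract alludes to can be sketched as follows (an illustration only, not the paper's scheduler or the RMP/RMCP protocol): two connections may share a time slot only if their source and destination ports are disjoint, matching the crossbar nature of a circuit-switched switch.

```python
def assign_slots(connections):
    """Greedy first-fit TDMA slot assignment: place each (src, dst)
    connection in the first slot where neither of its ports is already
    in use, opening a new slot only when no existing slot fits."""
    slots = []        # each slot: set of ports in use during that slot
    schedule = {}     # connection -> slot index
    for conn in connections:
        src, dst = conn
        for i, used in enumerate(slots):
            if src not in used and dst not in used:
                used.update((src, dst))
                schedule[conn] = i
                break
        else:
            slots.append({src, dst})
            schedule[conn] = len(slots) - 1
    return schedule

conns = [("A", "B"), ("A", "C"), ("D", "B"), ("D", "C")]
sched = assign_slots(conns)
# ("A","B") and ("D","C") share slot 0; ("A","C") and ("D","B") share slot 1
```

    Minimizing the number of slots is essentially an edge-coloring problem, which is why the paper observes that exact scheduling is unlikely to be tractable while simple heuristics like this one still utilize the network well.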